A Chart-based Method of ID/LP Parsing with Generalized Discrimination Networks
Abstract
Variations of word order are among the most well-known phenomena of natural languages. From a well-represented sample of the world's languages, Steele [13] shows that about 76% of languages exhibit significant word order variation. In addition to the well-known Walpiri (an Australian language), several languages such as Japanese, Thai, German, Hindi, and Finnish also allow considerable word order variation. It is widely accepted that such variations are governed by generalizations that should be expressed by the grammar. Generalized Phrase Structure Grammar (GPSG) [7] provides a way to account for these generalizations by decomposing grammar rules into Immediate Dominance (ID) rules and Linear Precedence (LP) rules. Using the ID/LP formalism, flexible word order languages can be described more concisely and more easily. However, designing an efficient algorithm that puts the separated components back together during actual parsing is a difficult problem. Given a set of ID/LP rules, one alternative is to compile it into another grammar description language, e.g. Context-Free Grammar (CFG), for which parsing algorithms already exist. However, the resulting object grammar tends to be very large and can slow down parsing dramatically; the method also loses the modularity of the ID/LP formalism. Another set of approaches [11, 4, 1] keeps the ID and LP rules as they are, without expanding them into other formalisms. Shieber [11] has proposed an interesting algorithm for direct ID/LP parsing that generalizes Earley's algorithm [6] to use the constraints of ID/LP rules directly. Although it can blow up in the worst case, Barton [3] has shown that Shieber's direct parsing algorithm usually does have a time advantage over running Earley's algorithm on the expanded CFG. Thus, from a computational point of view, the direct parsing strategy is likely to be an appealing candidate for parsing with ID/LP rules.
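To make the contrast between expansion and direct parsing concrete, here is a minimal Python sketch on a hypothetical toy grammar. The categories `V`, `NPs`, `NPo`, `PP`, the LP pairs, and all function names are illustrative assumptions, not taken from the paper. The sketch compiles one ID rule into a CFG rule for every LP-legal ordering, and alternatively counts the same orderings the way a direct parser in the spirit of Shieber's algorithm would: by consuming one unordered daughter at a time and checking the LP constraints as it goes. It assumes all daughters of the rule are distinct categories.

```python
from itertools import permutations

# Hypothetical toy grammar (illustrative only): one ID rule whose daughters are unordered,
#   S -> V, NPs, NPo, PP
# plus LP rules as pairs (a, b) meaning "a precedes b whenever both are sisters".
ID_RULE = ("S", ("V", "NPs", "NPo", "PP"))
LP_RULES = {("NPs", "NPo"), ("V", "PP")}


def respects_lp(order, lp):
    """True if no LP pair (a, b) appears with b before a. Assumes distinct daughters."""
    pos = {cat: i for i, cat in enumerate(order)}
    return all(pos[a] < pos[b] for a, b in lp if a in pos and b in pos)


def expand_to_cfg(id_rule, lp):
    """Compile one ID rule into the CFG rules for every LP-legal ordering of its daughters."""
    mother, daughters = id_rule
    return [(mother, order) for order in permutations(daughters) if respects_lp(order, lp)]


def can_consume(cat, remaining, lp):
    """Direct check: `cat` may be parsed next only if no other still-unparsed
    sister is required by LP to precede it."""
    others = list(remaining)
    others.remove(cat)
    return all((other, cat) not in lp for other in others)


def count_orders_direct(remaining, lp):
    """Count legal orders by consuming daughters one at a time, never expanding the rule."""
    if not remaining:
        return 1
    total = 0
    for cat in set(remaining):
        if can_consume(cat, remaining, lp):
            rest = list(remaining)
            rest.remove(cat)
            total += count_orders_direct(rest, lp)
    return total


cfg_rules = expand_to_cfg(ID_RULE, LP_RULES)
print(len(cfg_rules))                                    # 6 object-grammar rules
print(count_orders_direct(list(ID_RULE[1]), LP_RULES))  # 6, without building any of them
```

With no LP rules at all, `expand_to_cfg(ID_RULE, set())` yields all 24 permutations: an ID rule with n distinct daughters can expand into as many as n! CFG rules, which is the growth of the object grammar that direct parsing avoids.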
In this paper, we present a new approach to direct parsing with ID/LP rules that outperforms the previous methods. Besides the direct parsing property, three features contribute to its efficiency. First, ID rules are precompiled into generalized discrimination networks [9] to yield a compact representation of parsing states, and hence less computation time. Second, LP rules are precompiled into a Hasse diagram to minimize the time spent on order legality checks at run time. Third, its bottom-up, depth-first parsing strategy minimizes the work of edge checking and therefore saves a lot of processing time. We first briefly describe each feature of our parser. Then we present the parsing algorithm and a parsing example. We also compare our approach with other related work. Finally, we give our conclusions and directions for future work.
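The second feature, precompiling LP rules into a Hasse diagram, can be illustrated with a small Python sketch. This is an assumed reading of the idea, not the authors' implementation: the LP rules are treated as generating a strict partial order (transitive and acyclic), the Hasse diagram is computed as its transitive reduction, and, once before parsing, a table of every required predecessor of each category is derived by walking the diagram upward. The run-time legality check then reduces to a single set intersection. The categories and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical LP rules (a, b): "a precedes b". Assumed to generate a strict partial order.
LP_RULES = {("Det", "Adj"), ("Adj", "N"), ("Det", "N"), ("N", "PP")}


def transitive_closure(pairs):
    """Add (a, d) whenever (a, b) and (b, d) are present, until nothing changes."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def hasse_edges(pairs):
    """Transitive reduction: keep (a, b) only if no category lies strictly between a and b."""
    closure = transitive_closure(pairs)
    nodes = {x for pair in closure for x in pair}
    return {(a, b) for a, b in closure
            if not any((a, c) in closure and (c, b) in closure for c in nodes)}


def predecessor_table(hasse):
    """Precompute (once, before parsing) every category that must precede each
    category, by walking the Hasse diagram upward from it."""
    parents = defaultdict(set)
    for a, b in hasse:
        parents[b].add(a)
    preds = defaultdict(set)
    for node in list(parents):
        stack = list(parents[node])
        while stack:
            p = stack.pop()
            if p not in preds[node]:
                preds[node].add(p)
                stack.extend(parents[p])
    return preds


PREDS = predecessor_table(hasse_edges(LP_RULES))


def legal_next(cat, remaining):
    """Run-time check: `cat` may come next unless a still-unparsed sister must precede it."""
    others = set(remaining) - {cat}
    return not (PREDS[cat] & others)
```

Here `hasse_edges(LP_RULES)` drops the redundant `("Det", "N")`, and `legal_next("PP", ["Det", "PP"])` is rejected even though no LP rule mentions `Det` and `PP` together; the constraint comes from the chain `Det < Adj < N < PP` recorded in the diagram.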
Related papers
Parsing Generalized ID/LP Grammars
The Generalized ID/LP (GIDLP) grammar formalism (Daniels and Meurers 2004a,b; Daniels 2005) was developed to serve as a processing backbone for linearization-HPSG grammars, separating the declaration of the recursive constituent structure from the declaration of word order domains. This paper shows that the key aspects of this formalism – the ability for grammar writers to explicitly declare wo...
GIDLP: A Grammar Format For Linearization-based HPSG
Linearization-based HPSG theories are widely used for analyzing languages with relatively free constituent order. This paper introduces the Generalized ID/LP (GIDLP) grammar format, which supports a direct encoding of such theories, and discusses key aspects of a parser that makes use of the dominance, precedence, and linearization domain information explicitly encoded in this grammar format. W...
A Grammar Formalism and Parser for Linearization-based HPSG
Linearization-based HPSG theories are widely used for analyzing languages with relatively free constituent order. This paper introduces the Generalized ID/LP (GIDLP) grammar format, which supports a direct encoding of such theories, and discusses key aspects of a parser that makes use of the dominance, precedence, and linearization domain information explicitly encoded in this grammar format. W...
Formalization and Parsing of Typed Unification-Based ID/LP Grammars
This paper defines unification based ID/LP grammars based on typed feature structures as nonterminals and proposes a variant of Earley's algorithm to decide whether a given input sentence is a member of the language generated by a particular typed unification ID/LP grammar. A solution to the problem of the nonlocal flow of information in unification ID/LP grammars as mentioned in Seiffert (1991) is in...
On The Complexity Of ID/LP Parsing
Modern linguistic theory attributes surface complexity to interacting subsystems of constraints. For instance, the ID/LP grammar formalism separates constraints on immediate dominance from those on linear order. An ID/LP parsing algorithm by Shieber shows how to use ID and LP constraints directly in language processing, without expanding them into an intermediate context-free "object grammar". ...